Results 1 - 14 of 14
1.
Cell Rep ; 43(3): 113956, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38489267

ABSTRACT

Drugs of abuse can persistently change the reward circuit in ways that contribute to relapse behavior, partly via mechanisms that regulate chromatin structure and function. Nuclear orphan receptor subfamily 4 group A member 2 (NR4A2, also known as NURR1) is an important effector of histone deacetylase 3 (HDAC3)-dependent mechanisms in persistent memory processes and is highly expressed in the medial habenula (MHb), a region that regulates nicotine-associated behaviors. Here, expressing the Nr4a2 dominant negative (Nurr2c) in the MHb blocks reinstatement of cocaine seeking in mice. We use single-nucleus transcriptomics to characterize the molecular cascade following Nr4a2 manipulation, revealing changes in transcriptional networks related to addiction, neuroplasticity, and GABAergic and glutamatergic signaling. The network controlled by NR4A2 is characterized using a transcription factor regulatory network inference algorithm. These results identify the MHb as a pivotal regulator of relapse behavior and demonstrate the importance of NR4A2 as a key mechanism driving the MHb component of relapse.


Subjects
Cocaine , Habenula , Mice , Animals , Habenula/physiology , Cocaine/pharmacology , Memory , Gene Expression Regulation , Recurrence
2.
bioRxiv ; 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38464087

ABSTRACT

The gene expression profiles of distinct cell types reflect complex genomic interactions among multiple simultaneous biological processes within each cell that can be altered by disease progression as well as genetic background. The identification of these active cellular programs is an open challenge in the analysis of single-cell RNA-seq data. Latent Dirichlet Allocation (LDA) is a generative method used to identify recurring patterns in count data, commonly referred to as topics, which can be used to interpret the state of each cell. However, LDA's interpretability is hindered by several key factors, including the hyperparameter selection of the number of topics as well as the variability in topic definitions due to random initialization. We developed Topyfic, a Reproducible LDA (rLDA) package, to accurately infer the identity and activity of cellular programs in single-cell data, providing insights into the relative contributions of each program in individual cells. We apply Topyfic to brain single-cell and single-nucleus datasets of two 5xFAD mouse models of Alzheimer's disease crossed with C57BL/6J or CAST/EiJ mice to identify distinct cell types and states in different cell types such as microglia. We find that 8-month 5xFAD/Cast F1 males show a higher level of microglial activation than matching 5xFAD/BL6 F1 males, whereas female mice show similar levels of microglial activation. We show that regulatory genes such as TFs, microRNA host genes, and chromatin regulatory genes alone capture cell types and cell states. Our study highlights how topic modeling with a limited vocabulary of regulatory genes can identify gene expression programs in single-cell data in order to quantify similar and divergent cell states in distinct genotypes.
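
The abstract above notes that topic definitions vary with random initialization. One generic way to check which topics are reproducible is to compare the gene-weight vectors from two independently seeded runs and keep only the pairs that agree closely. The sketch below illustrates that idea with cosine similarity; it is not Topyfic's API or algorithm, and the function names, the 0.9 threshold, and the toy weights are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_topics(run_a, run_b, threshold=0.9):
    """For each topic in run_a, find its most similar topic in run_b.

    Returns (index_in_a, index_in_b, similarity) for pairs whose
    similarity reaches the (hypothetical) threshold.
    """
    matches = []
    for i, topic_a in enumerate(run_a):
        best_j, best_sim = max(
            ((j, cosine(topic_a, topic_b)) for j, topic_b in enumerate(run_b)),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            matches.append((i, best_j, round(best_sim, 3)))
    return matches


# Toy topic-by-gene weights from two runs with different seeds (invented).
run_a = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7], [0.25, 0.25, 0.25, 0.25]]
run_b = [[0.0, 0.1, 0.2, 0.7], [0.7, 0.2, 0.1, 0.0], [0.4, 0.1, 0.4, 0.1]]

reproducible = match_topics(run_a, run_b)
for a_idx, b_idx, sim in reproducible:
    print(f"run A topic {a_idx} ~ run B topic {b_idx} (cosine {sim})")
```

In this toy input the first two topics reappear in the second run in swapped order, while the third has no close counterpart and is dropped.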

3.
bioRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546854

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

4.
bioRxiv ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37546983

ABSTRACT

The pathogenesis of Alzheimer's disease (AD) depends on environmental and heritable factors, with remarkable differences evident between individuals at the molecular level. Here we present a transcriptomic survey of AD using spatial transcriptomics (ST) and single-nucleus RNA-seq in cortical samples from early-stage AD, late-stage AD, and AD in Down Syndrome (AD in DS) donors. Studying AD in DS provides an opportunity to enhance our understanding of the AD transcriptome, potentially bridging the gap between genetic mouse models and sporadic AD (sAD). Our analysis revealed spatial and cell-type-specific changes in disease, with broad similarities in these changes between sAD and AD in DS. We performed additional ST experiments in a disease timecourse of 5xFAD and wildtype mice to facilitate cross-species comparisons. Finally, amyloid plaque and fibril imaging in the same tissue samples used for ST enabled us to directly link changes in gene expression with accumulation and spread of pathology.

5.
Cell Rep Methods ; 3(6): 100498, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37426759

ABSTRACT

Biological systems are immensely complex, organized into a multi-scale hierarchy of functional units based on tightly regulated interactions between distinct molecules, cells, organs, and organisms. While experimental methods enable transcriptome-wide measurements across millions of cells, popular bioinformatic tools do not support systems-level analysis. Here we present hdWGCNA, a comprehensive framework for analyzing co-expression networks in high-dimensional transcriptomics data such as single-cell and spatial RNA sequencing (RNA-seq). hdWGCNA provides functions for network inference, gene module identification, gene enrichment analysis, statistical tests, and data visualization. Beyond conventional single-cell RNA-seq, hdWGCNA is capable of performing isoform-level network analysis using long-read single-cell data. We showcase hdWGCNA using data from autism spectrum disorder and Alzheimer's disease brain samples, identifying disease-relevant co-expression network modules. hdWGCNA is directly compatible with Seurat, a widely used R package for single-cell and spatial transcriptomics analysis, and we demonstrate the scalability of hdWGCNA by analyzing a dataset containing nearly 1 million cells.


Subjects
Alzheimer Disease , Autism Spectrum Disorder , Humans , Transcriptome/genetics , Autism Spectrum Disorder/genetics , Gene Expression Profiling , Gene Regulatory Networks/genetics , Alzheimer Disease/genetics
6.
Res Sq ; 2023 Jul 19.
Article in English | MEDLINE | ID: mdl-37503119

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

7.
Genome Biol ; 24(1): 171, 2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37474948

ABSTRACT

Although long-read RNA-seq is increasingly applied to characterize full-length transcripts, it can also enable detection of nucleotide variants, such as genetic mutations or RNA editing sites, an application that remains significantly under-explored. Here, we present an in-depth study to detect and analyze RNA editing sites in long-read RNA-seq. Our new method, L-GIREMI, effectively handles sequencing errors and read biases. Applied to PacBio RNA-seq data, L-GIREMI affords a high accuracy in RNA editing identification. Additionally, our analysis uncovered novel insights about RNA editing occurrences in single molecules and double-stranded RNA structures. L-GIREMI provides a valuable means to study nucleotide variants in long-read RNA-seq.


Subjects
RNA Editing , Transcriptome , RNA-Seq , Nucleotides , RNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
8.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
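
The triplet idea in the abstract above can be sketched in a few lines: each transcript is reduced to a (start site, exon junction chain, end site) triplet, and each gene is then placed on a three-way simplex according to how much of its diversity comes from each source. The sketch below uses a deliberately simplified coordinate, the share of unique elements of each kind; the preprint's exact definition of the simplex coordinates may differ, and the function name, tuple layout, and toy identifiers are hypothetical.

```python
from collections import defaultdict


def gene_simplex(transcripts):
    """Place each gene on a (TSS, exon-junction-chain, TES) simplex.

    transcripts: iterable of (gene, tss_id, ec_id, tes_id) tuples.
    Returns {gene: (tss_share, ec_share, tes_share)}, where the three
    shares sum to 1. This normalization is an assumption for illustration.
    """
    seen = defaultdict(lambda: (set(), set(), set()))
    for gene, tss, ec, tes in transcripts:
        tss_set, ec_set, tes_set = seen[gene]
        tss_set.add(tss)
        ec_set.add(ec)
        tes_set.add(tes)

    coords = {}
    for gene, (tss_set, ec_set, tes_set) in seen.items():
        counts = (len(tss_set), len(ec_set), len(tes_set))
        total = sum(counts)
        coords[gene] = tuple(c / total for c in counts)
    return coords


# Toy gene with four start sites, two junction chains, two end sites.
toy_transcripts = [
    ("GENE_A", 1, 1, 1),
    ("GENE_A", 2, 1, 1),
    ("GENE_A", 3, 2, 2),
    ("GENE_A", 4, 2, 2),
]

coords = gene_simplex(toy_transcripts)
print(coords["GENE_A"])
```

A gene whose first coordinate dominates, as in this toy case, would be described as biased toward promoter selection rather than splicing or 3' end choice.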

9.
bioRxiv ; 2023 May 04.
Article in English | MEDLINE | ID: mdl-37205386

ABSTRACT

Pathogenic loss-of-function SCN1A variants cause a spectrum of seizure disorders. We previously identified variants in individuals with SCN1A-related epilepsy that fall in or near a poison exon (PE) in SCN1A intron 20 (20N). We hypothesized these variants lead to increased PE inclusion, which introduces a premature stop codon, and, therefore, reduced abundance of the full-length SCN1A transcript and Nav1.1 protein. We used a splicing reporter assay to interrogate PE inclusion in HEK293T cells. In addition, we used patient-specific induced pluripotent stem cells (iPSCs) differentiated into neurons to quantify 20N inclusion by long and short-read sequencing and Nav1.1 abundance by western blot. We performed RNA-antisense purification with mass spectrometry to identify RNA-binding proteins (RBPs) that could account for the aberrant PE splicing. We demonstrate that variants in/near 20N lead to increased 20N inclusion by long-read sequencing or splicing reporter assay and decreased Nav1.1 abundance. We also identified 28 RBPs that differentially interact with variant constructs compared to wild-type, including SRSF1 and HNRNPL. We propose a model whereby 20N variants disrupt RBP binding to splicing enhancers (SRSF1) and suppressors (HNRNPL), to favor PE inclusion. Overall, we demonstrate that SCN1A 20N variants cause haploinsufficiency and SCN1A-related epilepsies. This work provides insights into the complex control of RBP-mediated PE alternative splicing, with broader implications for PE discovery and identification of pathogenic PE variants in other genetic conditions.
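
The mechanism described above, in which including a poison exon introduces a premature stop codon, can be shown with a toy reading-frame scan. The sequences below are invented for illustration and are not SCN1A sequence; the function name is hypothetical, and the sketch assumes the coding sequence starts in frame at position 0.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Invented toy exons, each a multiple of three nucleotides long.
upstream_exon = "ATGGCTGCT"    # ATG GCT GCT
poison_exon = "GGTTAAGCA"      # GGT TAA GCA (contains an in-frame stop)
downstream_exon = "GCTGGTTGA"  # GCT GGT TGA (normal stop is the last codon)

canonical = upstream_exon + downstream_exon
with_poison_exon = upstream_exon + poison_exon + downstream_exon

print("canonical stop at codon", first_stop_codon(canonical))
print("poison-exon stop at codon", first_stop_codon(with_poison_exon))
```

In the canonical toy transcript the first stop is the final codon, whereas with the poison exon included the stop appears well before the end of the transcript, which is the situation the abstract links to reduced full-length transcript and protein abundance.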

10.
bioRxiv ; 2023 Apr 06.
Article in English | MEDLINE | ID: mdl-37066421

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility as well as allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL; https://openwdl.org/), is publicly available on GitHub, with images available on Docker Hub (https://hub.docker.com), enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud allows small labs to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.

11.
Genome Res ; 32(2): 389-402, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34949670

ABSTRACT

Accurate transcription start site (TSS) annotations are essential for understanding transcriptional regulation and its role in human disease. Gene collections such as GENCODE contain annotations for tens of thousands of TSSs, but not all of these annotations are experimentally validated nor do they contain information on cell type-specific usage. Therefore, we sought to generate a collection of experimentally validated TSSs by integrating RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression (RAMPAGE) data from 115 cell and tissue types, which resulted in a collection of approximately 50 thousand representative RAMPAGE peaks. These peaks are primarily proximal to GENCODE-annotated TSSs and are concordant with other transcription assays. Because RAMPAGE uses paired-end reads, we were then able to connect peaks to transcripts by analyzing the genomic positions of the 3' ends of read mates. Using this paired-end information, we classified the vast majority (37 thousand) of our RAMPAGE peaks as verified TSSs, updating TSS annotations for 20% of GENCODE genes. We also found that these updated TSS annotations are supported by epigenomic and other transcriptomic data sets. To show the utility of this RAMPAGE rPeak collection, we intersected it with the NHGRI/EBI genome-wide association study (GWAS) catalog and identified new candidate GWAS genes. Overall, our work shows the importance of integrating experimental data to further refine TSS annotations and provides a valuable resource for the biological community.


Subjects
Gene Expression Regulation , Genome-Wide Association Study , Humans , Genetic Promoter Regions , Transcription Initiation Site
12.
Genome Biol ; 22(1): 286, 2021 Oct 07.
Article in English | MEDLINE | ID: mdl-34620214

ABSTRACT

The rise in throughput and quality of long-read sequencing should allow unambiguous identification of full-length transcript isoforms. However, its application to single-cell RNA-seq has been limited by throughput and expense. Here we develop and characterize long-read Split-seq (LR-Split-seq), which uses combinatorial barcoding to sequence single cells with long reads. Applied to the C2C12 myogenic system, LR-Split-seq associates isoforms with cell types with relative economy and design flexibility. We find widespread evidence of changing isoform expression during differentiation, including alternative transcription start sites (TSS) and/or alternative internal exon usage. LR-Split-seq provides an affordable method for identifying cluster-specific isoforms in single cells.


Subjects
RNA Isoforms/metabolism , RNA-Seq/methods , Single-Cell Analysis/methods , Animals , Cell Differentiation/genetics , Cell Line , Cell Nucleus/genetics , Chromatin/metabolism , Genomics , Mice , Genetic Models , Myogenin/genetics , PAX7 Transcription Factor/genetics , Transcription Initiation Site , Genetic Transcription
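
Combinatorial barcoding, as named in the abstract above, identifies a cell by the combination of barcodes it picks up over successive rounds rather than by a single barcode. The sketch below shows the arithmetic and the grouping step in general terms. The figures of 96 wells and three rounds, the barcode labels, and the function names are assumptions for illustration and are not taken from the abstract.

```python
from collections import defaultdict


def barcode_space(wells_per_round, rounds):
    """Number of distinct barcode combinations available."""
    return wells_per_round ** rounds


def cell_id(round_barcodes):
    """Join the per-round barcodes into one cell identifier."""
    return "_".join(round_barcodes)


def group_reads_by_cell(reads):
    """reads: iterable of (read_name, [barcode_round1, barcode_round2, ...])."""
    cells = defaultdict(list)
    for read_name, round_barcodes in reads:
        cells[cell_id(round_barcodes)].append(read_name)
    return dict(cells)


# Hypothetical design: 96 wells per round, three rounds.
print(barcode_space(96, 3))

# Toy reads: the first two share all three barcodes, so they are
# assigned to the same cell; the third differs in round 1.
toy_reads = [
    ("read1", ["A1", "B7", "C3"]),
    ("read2", ["A1", "B7", "C3"]),
    ("read3", ["A2", "B7", "C3"]),
]
cells = group_reads_by_cell(toy_reads)
print(cells)
```

Because the number of combinations grows as a power of the number of rounds, a small set of barcodes per round can label many cells, which is part of what makes the approach economical.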
13.
Bioinformatics ; 37(9): 1322-1323, 2021 Jun 09.
Article in English | MEDLINE | ID: mdl-32991665

ABSTRACT

MOTIVATION: Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have revealed an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. RESULTS: Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. AVAILABILITY AND IMPLEMENTATION: The Swan library for Python 3 is available on PyPI at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


Subjects
Anseriformes , Transcriptome , Animals , Gene Library , RNA Sequence Analysis , Software
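
An exon skipping event of the kind counted in the abstract above can be defined generically: one transcript has an intron that completely spans an exon used by another transcript of the same gene. The sketch below implements that definition directly on exon coordinates. It does not use Swan's API and is not a description of Swan's implementation; the function names, the coordinate convention (start, end), and the toy transcripts are assumptions for illustration.

```python
def introns(exons):
    """Derive introns as the gaps between consecutive sorted exons."""
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def exon_skipping_events(transcripts):
    """Find (skipping_transcript, skipped_exon) pairs within one gene.

    transcripts: {transcript_id: [(exon_start, exon_end), ...]}
    An exon counts as skipped when it lies strictly inside an intron
    of a different transcript.
    """
    events = set()
    for skipper, skipper_exons in transcripts.items():
        for intron_start, intron_end in introns(skipper_exons):
            for other, other_exons in transcripts.items():
                if other == skipper:
                    continue
                for exon_start, exon_end in other_exons:
                    if intron_start < exon_start and exon_end < intron_end:
                        events.add((skipper, (exon_start, exon_end)))
    return sorted(events)


# Toy gene: T2 lacks the middle exon that T1 includes.
toy_gene = {
    "T1": [(100, 200), (300, 400), (500, 600)],
    "T2": [(100, 200), (500, 600)],
}

events = exon_skipping_events(toy_gene)
print(events)
```

Intron retention can be framed as the mirror image of this test, with an exon of one transcript spanning an intron of another.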
14.
RNA ; 25(12): 1793-1805, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31554659

ABSTRACT

Pre-mRNA splicing is regulated through multiple trans-acting splicing factors. These regulators interact with the pre-mRNA at intronic and exonic positions. Given that most exons are protein coding, the evolution of exons must be modulated by a combination of selective coding and splicing pressures. It has previously been demonstrated that selective splicing pressures are more easily deconvoluted when phylogenetic comparisons are made for exons of identical size, suggesting that exon size-filtered sequence alignments may improve identification of nucleotides evolved to mediate efficient exon ligation. To test this hypothesis, an exon size database was created, filtering 76 vertebrate sequence alignments based on exon size conservation. In addition to other genomic parameters, such as splice-site strength, gene position, or flanking intron length, this database permits the identification of exons that are size- and/or sequence-conserved. Highly size-conserved exons are always sequence-conserved. However, sequence conservation does not necessitate exon size conservation. Our analysis identified evolutionarily young exons and demonstrated that length conservation is a strong predictor of alternative splicing. A published data set of approximately 5000 exonic SNPs associated with disease was analyzed to test the hypothesis that exon size-filtered sequence comparisons increase detection of splice-altering nucleotides. Improved splice predictions could be achieved when mutations occur at the third codon position, especially when a mutation decreases exon inclusion efficiency. The results demonstrate that coding pressures dominate nucleotide composition at invariable codon positions and that exon size-filtered sequence alignments permit identification of splice-altering nucleotides at wobble positions.


Subjects
Alternative Splicing , Base Sequence , Conserved Sequence , Exons , Humans , Nucleotides , Phylogeny , Single Nucleotide Polymorphism , RNA Precursors/genetics